Ricardo Baeza-Yates
Authors
Abstract
The XML Fragment model offers a convenient formalism for querying XML collections "by example", that is, by formulating the query as a piece of XML that expresses the user's needs. This allows relevant results to be returned either as full documents or as XML Fragments, using a simple extension of the vector space model for ranking. In this work, we investigate extending this model to text analytics applications where semantic tags (e.g., names, entities, relations) are automatically generated to annotate the underlying text. Each type of tag can easily be represented as an XML element, but the spans of these tags often cross over each other, which the XML DOM structure does not allow. We discuss how our original XML Fragments model can be extended to query annotated documents with possibly overlapping annotations, and we illustrate our approach with examples of queries over annotated documents generated in the context of IBM’s Unstructured Information Management Architecture (UIMA) framework.
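To make the overlap problem concrete, the following is a minimal sketch, not the method of the paper and not the UIMA API. It assumes annotations are stored stand-off, as a tag name plus a half-open character span; the `Annotation` type, the tag names, and the sample sentence are all hypothetical. It reports the pairs of spans that cross, which are exactly the pairs that cannot both be elements of one well-nested XML tree.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """A stand-off annotation: a tag name plus a half-open span [start, end)."""
    tag: str
    start: int
    end: int


def spans_cross(a: Annotation, b: Annotation) -> bool:
    """True when the spans partially overlap and neither contains the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def crossing_pairs(annotations):
    """Return every pair of annotations that could not nest in a single XML tree."""
    return [
        (a, b)
        for i, a in enumerate(annotations)
        for b in annotations[i + 1:]
        if spans_cross(a, b)
    ]


# Hypothetical sample text and annotations, for illustration only.
text = "IBM Research in Haifa built it"
annotations = [
    Annotation("Organization", 0, 12),  # "IBM Research"
    Annotation("LocatedIn", 4, 21),     # "Research in Haifa"
    Annotation("Location", 16, 21),     # "Haifa"
]

for a, b in crossing_pairs(annotations):
    print(f"{a.tag} {text[a.start:a.end]!r} crosses {b.tag} {text[b.start:b.end]!r}")
```

In this sample, `Location` nests cleanly inside `LocatedIn`, while `Organization` and `LocatedIn` cross, so only that pair is reported.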
Similar resources
Diseñemos Todo de Nuevo: Reflexiones sobre la Computación y su Enseñanza (Invited paper)
What to teach and how to teach it are the fundamental questions in our work as lecturers. This paper presents my view on these questions as they relate to computer science, and offers a critical and constructive analysis and its implications for education, including two partial answers to these questions. Ricardo Baeza Yates, Revista Colombiana de Computación, vol. 1, no. 1, pp. 7-28.